
(wip) feat(tables): in-process Iceberg REST Catalog adapter #607

Draft: mkuchenbecker wants to merge 1 commit into linkedin:main from mkuchenbecker:mkuchenb/iceberg-rest-adapter

@mkuchenbecker

Summary

Adds an in-process Iceberg REST Catalog facade in front of the existing Tables Service. The new com.linkedin.openhouse.tables.rest package is picked up by the existing TablesSpringApplication component scan, so no Spring-app wiring changes are required. Any client that speaks the Apache Iceberg REST wire protocol (Spark, Trino, PyIceberg, Flink, …) can now read and write OpenHouse tables without an OpenHouse-specific plugin.

The new package contributes about 960 lines of new Java and makes no changes to existing files. It reuses TablesApiHandler, IcebergSnapshotsApiHandler, DatabasesApiHandler, and OpenHouseInternalCatalog via constructor injection. Server-side metadata authorship and the existing two-stage CAS (path-string version check + HouseTables @Version JPA lock) are preserved unchanged: REST clients reach the same OpenHouseInternalTableOperations.doCommit path that the existing OpenHouse Java client already uses.
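
To make the two-stage CAS concrete, here is a minimal in-memory sketch, assuming a single table row that stores the current metadata.json path plus an optimistic-lock counter standing in for the HouseTables @Version column. The class and exception names (TwoStageCasSketch, CommitConflictException) are hypothetical; this illustrates the technique and is not the OpenHouseInternalTableOperations.doCommit code.

```java
import java.util.Objects;

/**
 * Illustrative in-memory model of a two-stage compare-and-swap commit.
 * Stage 1 compares metadata.json path strings; stage 2 compares a version
 * counter the way a JPA @Version column would. Hypothetical, not OpenHouse code.
 */
final class TwoStageCasSketch {

  /** What a persisted table row might hold: current metadata path plus lock version. */
  static final class TableRow {
    final String metadataLocation;
    final long version;

    TableRow(String metadataLocation, long version) {
      this.metadataLocation = metadataLocation;
      this.version = version;
    }
  }

  /** Raised when either stage sees that another writer committed first. */
  static final class CommitConflictException extends RuntimeException {
    CommitConflictException(String message) {
      super(message);
    }
  }

  private TableRow row;

  TwoStageCasSketch(String initialMetadataLocation) {
    this.row = new TableRow(initialMetadataLocation, 0L);
  }

  synchronized TableRow current() {
    return row;
  }

  /** Swaps in newLocation only if (baseLocation, baseVersion) is still the current state. */
  synchronized TableRow commit(String baseLocation, long baseVersion, String newLocation) {
    // Stage 1: path-string check; a writer that started from an older metadata.json loses.
    if (!Objects.equals(row.metadataLocation, baseLocation)) {
      throw new CommitConflictException(
          "stale base " + baseLocation + ", current is " + row.metadataLocation);
    }
    // Stage 2: optimistic-lock check, standing in for the @Version column on save.
    if (row.version != baseVersion) {
      throw new CommitConflictException(
          "stale version " + baseVersion + ", current is " + row.version);
    }
    row = new TableRow(newLocation, row.version + 1);
    return row;
  }
}
```

In the real service the two stages live in different layers (the path comparison in the table operations, the version comparison in the JPA save), so a writer that slips past the first check is still caught by the second; inside this single synchronized object the second check is redundant and is kept only to show the order.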

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

Client-facing API: new endpoints under /iceberg/v1/...

All endpoints accept and return Iceberg's standard REST wire format.

| Method | Path | Backed by |
|---|---|---|
| GET | /v1/config | static; echoes ?warehouse= back as overrides.prefix |
| GET | /v1/{prefix}/namespaces | DatabasesApiHandler.getAllDatabases |
| POST | /v1/{prefix}/namespaces | accepted as success (OpenHouse auto-creates it on first table) |
| GET | /v1/{prefix}/namespaces/{namespace} | existence check via DatabasesApiHandler |
| HEAD | /v1/{prefix}/namespaces/{namespace} | same as GET |
| DELETE | /v1/{prefix}/namespaces/{namespace} | accepted as 204 (no native drop-namespace) |
| GET | /v1/{prefix}/namespaces/{namespace}/tables | TablesApiHandler.searchTables |
| POST | /v1/{prefix}/namespaces/{namespace}/tables | TablesApiHandler.createTable |
| GET | /v1/{prefix}/namespaces/{namespace}/tables/{table} | TablesApiHandler.getTable + OpenHouseInternalCatalog.loadTable |
| HEAD | /v1/{prefix}/namespaces/{namespace}/tables/{table} | same as GET |
| POST | /v1/{prefix}/namespaces/{namespace}/tables/{table} (commit) | IcebergSnapshotsApiHandler.putIcebergSnapshots or TablesApiHandler.updateTable |
| DELETE | /v1/{prefix}/namespaces/{namespace}/tables/{table} | TablesApiHandler.deleteTable |
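
The /v1/config row is what lets the warehouse name flow into every later path. A minimal sketch, assuming the response is modelled as plain maps rather than Iceberg's ConfigResponse, that the adapter is mounted at /iceberg, and that an absent ?warehouse= simply yields no prefix override (the PR does not say what happens in that case). ConfigEndpointSketch and tablePath are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the GET /v1/config response body as nested maps. */
final class ConfigEndpointSketch {

  /**
   * Mirrors {"defaults": {}, "overrides": {"prefix": <warehouse>}}; a stock client merges
   * overrides over its own properties and then uses "prefix" in every later request path.
   */
  static Map<String, Map<String, String>> config(String warehouse) {
    Map<String, String> overrides = new LinkedHashMap<>();
    if (warehouse != null && !warehouse.isEmpty()) {
      // Assumption: with no ?warehouse=, no prefix override is returned.
      overrides.put("prefix", warehouse);
    }
    Map<String, Map<String, String>> body = new LinkedHashMap<>();
    body.put("defaults", Collections.emptyMap());
    body.put("overrides", overrides);
    return body;
  }

  /** Table path a client would build once it has learned the prefix. */
  static String tablePath(String prefix, String namespace, String table) {
    return "/iceberg/v1/" + prefix + "/namespaces/" + namespace + "/tables/" + table;
  }
}
```

With warehouse = oh, as in the Spark session config under Testing Done, a stock client would then address the smoke table as /iceberg/v1/oh/namespaces/smoke/tables/t1.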

The commit endpoint takes the Iceberg requirements + updates payload, pre-checks each UpdateRequirement, replays the updates via MetadataUpdate.applyTo(TableMetadata.Builder), and then discriminates: snapshot changes route to IcebergSnapshotsApiHandler.putIcebergSnapshots; metadata-only commits route to TablesApiHandler.updateTable.
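
The control flow of the commit endpoint can be sketched as below. Assumptions: requirement and update types are the Iceberg REST spec's action strings, only two requirement types are modelled, the MetadataUpdate.applyTo replay step is omitted, and the set of actions treated as "snapshot changes" (add-snapshot, remove-snapshots, set-snapshot-ref, remove-snapshot-ref) is a guess rather than the PR's actual rule. CommitRoutingSketch and its nested types are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch: validate every requirement, then pick the backing handler. */
final class CommitRoutingSketch {

  enum Route { PUT_ICEBERG_SNAPSHOTS, UPDATE_TABLE }

  /** Subset of current table state that the modelled requirements inspect. */
  static final class TableState {
    final String uuid;
    final int currentSchemaId;

    TableState(String uuid, int currentSchemaId) {
      this.uuid = uuid;
      this.currentSchemaId = currentSchemaId;
    }
  }

  /** One entry of the request's "requirements" array, e.g. type "assert-table-uuid". */
  static final class Requirement {
    final String type;
    final String expected;

    Requirement(String type, String expected) {
      this.type = type;
      this.expected = expected;
    }
  }

  /** Assumption: update actions that count as "snapshot changes". */
  private static final Set<String> SNAPSHOT_ACTIONS = new HashSet<>(Arrays.asList(
      "add-snapshot", "remove-snapshots", "set-snapshot-ref", "remove-snapshot-ref"));

  /** Fails on the first unmet requirement; Iceberg clients expect a 409 CommitFailedException. */
  static void checkRequirements(TableState state, List<Requirement> requirements) {
    for (Requirement r : requirements) {
      switch (r.type) {
        case "assert-table-uuid":
          if (!Objects.equals(state.uuid, r.expected)) {
            throw new IllegalStateException("table UUID changed");
          }
          break;
        case "assert-current-schema-id":
          if (state.currentSchemaId != Integer.parseInt(r.expected)) {
            throw new IllegalStateException("current schema changed");
          }
          break;
        default:
          throw new UnsupportedOperationException("not modelled in this sketch: " + r.type);
      }
    }
  }

  /** Any snapshot-touching action sends the whole commit down the snapshots path. */
  static Route route(List<String> updateActions) {
    for (String action : updateActions) {
      if (SNAPSHOT_ACTIONS.contains(action)) {
        return Route.PUT_ICEBERG_SNAPSHOTS;
      }
    }
    return Route.UPDATE_TABLE;
  }

  static Route commit(TableState state, List<Requirement> requirements, List<String> updateActions) {
    checkRequirements(state, requirements);
    // The real endpoint also replays the updates via MetadataUpdate.applyTo here (omitted).
    return route(updateActions);
  }
}
```

Routing on "any snapshot action" means a commit that mixes snapshot and property changes goes down the snapshots path, which is the more general of the two handlers.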

New Features

Lets any stock Iceberg REST client (Spark org.apache.iceberg.rest.RESTCatalog, PyIceberg RestCatalog, Trino iceberg-rest connector, Flink) talk to OpenHouse without per-engine catalog code.

A @RestControllerAdvice(basePackages = "com.linkedin.openhouse.tables.rest") maps OpenHouse internal exceptions to Iceberg's wire-format ErrorResponse JSON. The advice is package-scoped so OpenHouse's existing exception handler for the native /v1/databases/... surface is unaffected.
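
A minimal sketch of that mapping, assuming the pairings below: the commit description names NoSuchUserTableException, AlreadyExistsException, and EntityConcurrentModificationException but not which Iceberg error each becomes. The body shape {"error": {"message", "type", "code"}} is the one Iceberg's ErrorResponse serializes to; the catch-all 500 and the ErrorMappingSketch name are assumptions, and exceptions are keyed by simple class name only so the sketch compiles without OpenHouse on the classpath (the real advice would use @ExceptionHandler methods).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mapping from OpenHouse exceptions to Iceberg ErrorResponse JSON. */
final class ErrorMappingSketch {

  /** Iceberg error "type" string plus HTTP status code. */
  static final class IcebergError {
    final String type;
    final int code;

    IcebergError(String type, int code) {
      this.type = type;
      this.code = code;
    }
  }

  // Keyed by simple class name so the sketch compiles without OpenHouse on the classpath.
  private static final Map<String, IcebergError> MAPPING = new HashMap<>();

  static {
    MAPPING.put("NoSuchUserTableException", new IcebergError("NoSuchTableException", 404));
    MAPPING.put("AlreadyExistsException", new IcebergError("AlreadyExistsException", 409));
    MAPPING.put("EntityConcurrentModificationException", new IcebergError("CommitFailedException", 409));
  }

  /** Unknown exceptions fall back to a generic 500 (assumption). */
  static IcebergError map(String exceptionSimpleName) {
    IcebergError mapped = MAPPING.get(exceptionSimpleName);
    return mapped != null ? mapped : new IcebergError("ServiceFailureException", 500);
  }

  /** Renders {"error": {"message", "type", "code"}}; escaping covers only quotes and backslashes. */
  static String toJson(String exceptionSimpleName, String message) {
    IcebergError e = map(exceptionSimpleName);
    return "{\"error\":{\"message\":\"" + escape(message) + "\",\"type\":\"" + e.type
        + "\",\"code\":" + e.code + "}}";
  }

  private static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }
}
```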

MVP scope notes (intentional)

  • Single-level namespaces only (depth > 1 → 400). Rejection was chosen over flatten-encoding so that a future multi-level migration is purely additive (no HDFS path rewrites). Spark, Trino, and PyIceberg all work with depth-1 namespaces when the warehouse is configured that way.
  • Out of scope for this PR: views, multi-table transactions, server-side scan planning, credential vending, remote signing. These are simply not advertised, so clients skip them.
  • No new external dependencies: Iceberg wire types (UpdateTableRequest, LoadTableResponse, ConfigResponse, ErrorResponse, …) come from iceberg-core 1.5.2, which is already on the Tables Service classpath.
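
Depth is visible in the {namespace} path segment itself: stock Iceberg REST clients join the levels of a multi-level namespace with the 0x1F unit separator (sent as %1F). A minimal sketch of the depth-1 check, assuming the web framework has already percent-decoded the path variable; NamespaceDepthSketch and BadRequestException are hypothetical names, and the real adapter's 400 would be rendered as an Iceberg ErrorResponse.

```java
/** Hypothetical enforcement of single-level namespaces on the {namespace} path variable. */
final class NamespaceDepthSketch {

  /** Separator Iceberg REST clients place between namespace levels (%1F on the wire). */
  static final String LEVEL_SEPARATOR = "\u001F";

  /** Stand-in for whatever the adapter turns into an HTTP 400 response. */
  static final class BadRequestException extends RuntimeException {
    BadRequestException(String message) {
      super(message);
    }
  }

  /** Returns the database name for a depth-1 namespace and rejects anything deeper. */
  static String requireSingleLevel(String decodedNamespace) {
    if (decodedNamespace == null || decodedNamespace.isEmpty()) {
      throw new BadRequestException("namespace must not be empty");
    }
    // limit -1 keeps trailing empty levels, so "a<US>" still counts as depth 2.
    String[] levels = decodedNamespace.split(LEVEL_SEPARATOR, -1);
    if (levels.length > 1) {
      throw new BadRequestException(
          "multi-level namespaces are not supported (depth " + levels.length + ")");
    }
    return levels[0];
  }
}
```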

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

End-to-end smoke test against the oh-hadoop-spark docker recipe, using the stock Iceberg RESTCatalog client with no OpenHouse plugin activated.

```bash
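# Build the Tables Service jar that contains the new adapter
./gradlew :services:tables:bootJar
# Rebuild the tables image from the recipe and start the stack
cd infra/recipes/docker-compose/oh-hadoop-spark
docker compose build openhouse-tables
docker compose up -d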
```

Spark session config (catalog oh is stock Iceberg):

```
spark.sql.catalog.oh = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.oh.catalog-impl = org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.oh.uri = http://openhouse-tables:8080/iceberg/
spark.sql.catalog.oh.token =
spark.sql.catalog.oh.warehouse = oh
```
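
Before bringing up spark-sql, the adapter can also be probed directly. A minimal Java 11+ sketch using java.net.http, assuming the base URI and warehouse from the config above; the token parameter mirrors spark.sql.catalog.oh.token (whose value is not shown) and is sent as a bearer header only when non-empty. AdapterProbeSketch is a hypothetical name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical direct probe of the adapter with the JDK 11+ HTTP client. */
final class AdapterProbeSketch {

  /** GET {base}v1/config?warehouse={warehouse}; base is expected to end with "/". */
  static HttpRequest configRequest(String baseUri, String warehouse, String token) {
    return get(baseUri + "v1/config?warehouse=" + warehouse, token);
  }

  /** GET {base}v1/{prefix}/namespaces, the call behind SHOW NAMESPACES. */
  static HttpRequest listNamespacesRequest(String baseUri, String prefix, String token) {
    return get(baseUri + "v1/" + prefix + "/namespaces", token);
  }

  private static HttpRequest get(String url, String token) {
    HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
    if (token != null && !token.isEmpty()) {
      // RESTCatalog sends a configured token as a bearer header; mirror that when one is set.
      builder.header("Authorization", "Bearer " + token);
    }
    return builder.build();
  }

  /** Sends the request and returns the JSON body; needs the docker recipe to be up. */
  static String send(HttpRequest request) throws IOException, InterruptedException {
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IOException("HTTP " + response.statusCode() + " for " + request.uri());
    }
    return response.body();
  }
}
```

From a container on the recipe's network, send(configRequest("http://openhouse-tables:8080/iceberg/", "oh", "")) should return a body whose overrides carry "prefix": "oh" if the adapter behaves as the endpoint table describes.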

SQL script executed:

```sql
SHOW NAMESPACES IN oh;
CREATE NAMESPACE IF NOT EXISTS oh.smoke;
DROP TABLE IF EXISTS oh.smoke.t1;
CREATE TABLE oh.smoke.t1 (id bigint, name string) USING iceberg;
SHOW TABLES IN oh.smoke;
INSERT INTO oh.smoke.t1 VALUES (1,'alice'),(2,'bob'),(3,'carol');
SELECT count(*) FROM oh.smoke.t1;
SELECT * FROM oh.smoke.t1 ORDER BY id;
INSERT INTO oh.smoke.t1 VALUES (4,'dave');
SELECT count(*) FROM oh.smoke.t1;
SELECT * FROM oh.smoke.t1 ORDER BY id;
DROP TABLE oh.smoke.t1;
SHOW TABLES IN oh.smoke;
```

Result (trimmed):

```
smoke t1
Time taken: 0.595 seconds, Fetched 1 row(s)
Time taken: 6.497 seconds
3
1 alice
2 bob
3 carol
Time taken: 1.093 seconds, Fetched 3 row(s)
Time taken: 2.57 seconds
4
1 alice
2 bob
3 carol
4 dave
Time taken: 0.351 seconds, Fetched 4 row(s)
```

All commands succeed end-to-end. Spark uses the stock Iceberg RESTCatalog; the adapter translates each request, delegates to existing OpenHouse handlers, and translates the response back. The new metadata.json files are written server-side by OpenHouseInternalTableOperations.doCommit (unchanged), and Spark reads them back via the standard Iceberg path.

Follow-up work intentionally not in this PR:

  • Unit + Spring @WebMvcTest coverage for each controller
  • @SpringBootTest with the upstream RESTCatalog Java client against an H2 Tables Service
  • CI smoke that runs the docker SQL round-trip on every PR
  • Configuration knob to enable/disable the adapter (currently always on)
  • Credential vending (LoadTableResponse.storage-credentials)
  • Multi-table transactions, views, multi-level namespaces

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

No breaking changes. The new package is additive; the existing /v1/databases/... surface is untouched. The new @RestControllerAdvice is scoped via basePackages = "com.linkedin.openhouse.tables.rest" so OpenHouse's existing exception handler keeps owning everything else.

Adds an in-process Iceberg REST Catalog facade in front of the Tables Service.
The new `com.linkedin.openhouse.tables.rest` package is picked up by the
existing `TablesSpringApplication` component scan; no Spring-app wiring
changes are required.

Endpoints (all under `/iceberg/v1/...`):
  GET    /v1/config
  GET    /v1/{prefix}/namespaces
  POST   /v1/{prefix}/namespaces
  GET    /v1/{prefix}/namespaces/{namespace}
  HEAD   /v1/{prefix}/namespaces/{namespace}
  DELETE /v1/{prefix}/namespaces/{namespace}
  GET    /v1/{prefix}/namespaces/{namespace}/tables
  POST   /v1/{prefix}/namespaces/{namespace}/tables
  GET    /v1/{prefix}/namespaces/{namespace}/tables/{table}
  HEAD   /v1/{prefix}/namespaces/{namespace}/tables/{table}
  POST   /v1/{prefix}/namespaces/{namespace}/tables/{table}
  DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}

The commit endpoint replays the Iceberg `requirements + updates` payload via
`MetadataUpdate.applyTo(TableMetadata.Builder)`, then discriminates between
snapshot commits (route to `IcebergSnapshotsApiHandler.putIcebergSnapshots`)
and metadata-only commits (route to `TablesApiHandler.updateTable`).
Server-side metadata authorship and the existing two-stage CAS (path-string
version check plus HouseTables `@Version` JPA lock) are preserved unchanged:
REST clients reach the same `OpenHouseInternalTableOperations.doCommit` path
that OpenHouse's Java client already uses.

MVP scope:
- single-level namespaces (Iceberg-spec depth > 1 -> 400 BadRequest); rejection
  chosen over flatten-encoding so a future multi-level migration is purely
  additive and does not require HDFS path rewrites.
- no views, no multi-table transactions, no scan planning, no credential
  vending, no remote signing. Out-of-spec features are simply not advertised.
- depends on iceberg-core 1.5.2 wire types (`UpdateTableRequest`,
  `LoadTableResponse`, `ConfigResponse`, `ErrorResponse`, ...) already on the
  Tables Service classpath; no new external dependencies.

A `@RestControllerAdvice(basePackages = "com.linkedin.openhouse.tables.rest")`
maps OpenHouse exceptions (`NoSuchUserTableException`, `AlreadyExistsException`,
`EntityConcurrentModificationException`, ...) to Iceberg's wire-format
`ErrorResponse`. The advice is scoped to the new package so OpenHouse's
existing exception handler for the native `/v1/databases/...` surface is
unaffected.

Smoke-tested end-to-end against the `oh-hadoop-spark` docker recipe: a Spark
3.1 spark-sql session configured with stock `org.apache.iceberg.spark.SparkCatalog`
+ `catalog-impl = org.apache.iceberg.rest.RESTCatalog` (no OpenHouse plugin
activated) successfully runs a CREATE NAMESPACE, CREATE TABLE, INSERT, SELECT,
DROP TABLE round-trip against the Tables Service.
cbb330 previously requested changes May 27, 2026

cbb330 left a comment:

Let's integrate with the existing work, since it has been reviewed and has a known-good architecture. It is read-side only; the next set of changes should be write-side, then database.

mkuchenbecker dismissed cbb330's stale review May 27, 2026 22:37

This is not in review.

mkuchenbecker changed the title from "feat(tables): in-process Iceberg REST Catalog adapter" to "(wip) feat(tables): in-process Iceberg REST Catalog adapter" on May 27, 2026